Regression testing for wrapper maintenance
Author
Abstract
Recent work on Internet information integration assumes a library of wrappers, specialized information extraction procedures. Maintaining wrappers is difficult, because the formatting regularities on which they rely often change. The wrapper verification problem is to determine whether a wrapper is correct. Standard regression testing approaches are inappropriate, because both the formatting regularities and a site's underlying content may change. We introduce RAPTURE, a fully implemented, domain-independent verification algorithm. RAPTURE uses well-motivated heuristics to compute the similarity between a wrapper's expected and observed output. Experiments with 27 actual Internet sites show a substantial performance improvement over standard regression testing.
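The contrast between exact-match regression testing and similarity-based verification can be illustrated with a minimal sketch. This is not the RAPTURE algorithm: the abstract does not specify its heuristics, so the features (field length, digit fraction, word count), the function names, and the tolerance values below are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

# Hypothetical, content-independent features of one extracted field value.
# These are illustrative assumptions, not the heuristics used by RAPTURE.
def features(value: str) -> dict:
    n = len(value)
    return {
        "length": float(n),
        "digit_fraction": sum(c.isdigit() for c in value) / n if n else 0.0,
        "word_count": float(len(value.split())),
    }

def learn_profile(expected: list) -> dict:
    """Record the mean and spread of each feature over known-good output."""
    rows = [features(v) for v in expected]
    return {
        name: (mean([r[name] for r in rows]), pstdev([r[name] for r in rows]))
        for name in rows[0]
    }

def looks_correct(profile: dict, observed: list, tolerance: float = 3.0) -> bool:
    """Accept observed output whose feature means stay close to the profile."""
    if not observed:
        return False  # a wrapper that extracts nothing is treated as broken
    rows = [features(v) for v in observed]
    for name, (mu, sigma) in profile.items():
        observed_mean = mean([r[name] for r in rows])
        # Small slack so features with zero spread do not reject tiny shifts.
        allowed = tolerance * sigma + 0.1 * abs(mu) + 0.05
        if abs(observed_mean - mu) > allowed:
            return False
    return True

def exact_match(expected: list, observed: list) -> bool:
    """Standard regression test: output must equal the stored output."""
    return expected == observed

# Known-good output of a hypothetical price wrapper.
expected = ["$12.99", "$8.50", "$104.00", "$3.25"]
profile = learn_profile(expected)

# The site's content changed, but the wrapper still works.
fresh = ["$15.49", "$7.00", "$99.95"]
# The site's formatting changed, and the wrapper now extracts the wrong text.
broken = ["<b>Add to cart</b>"] * 3
```

In this sketch the exact-match test rejects `fresh` even though the wrapper is still correct, because the site's content changed; the feature-based check accepts `fresh` and rejects `broken`, which is the distinction the verification problem calls for.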
Similar resources
Automatic Wrapper Generation and Maintenance
This paper investigates automatic wrapper generation and maintenance for forum, blog and news web sites. Web pages are increasingly generated dynamically from a common template populated with data from databases. This paper proposes a novel method that uses tree alignment and transfer learning to generate wrappers from this kind of web page. The tree alignment algorithm is adopted...
Wrapper Verification
Many Internet information-management applications (e.g., information integration systems) require a library of wrappers, specialized information extraction procedures that translate a source's native format into a structured representation suitable for further application-specific processing. Maintaining wrappers is tedious and error-prone, because the formatting regularities on which wrappers r...
Wrapper Maintenance: A Machine Learning Approach
The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wr...
An Approach to Cost Effective Regression Testing in Black-Box Testing Environment
Regression testing is an expensive and frequently executed maintenance activity used to revalidate modified software. Because it is executed so frequently during the software maintenance phase, it consumes a large portion of the software maintenance budget. Any reduction in the cost of regression testing would help to reduce the software maintenance cost. The current rese...
Regression Testing Cost Reduction Suite
The estimated cost of software maintenance exceeds 70 percent of total software costs [1], and a large portion of these maintenance expenses is devoted to regression testing. Regression testing is an expensive and frequently executed maintenance activity used to revalidate modified software. Any reduction in the cost of regression testing would help to reduce the software maintenance cost. Tes...